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(^ , Abstract 

Algorithms are presented for evaluating gradients and Hessians of logarithmic barrier func- 
jrt I tions for two types of convex cones: the cone of positive semidefinite matrices with a given 

sparsity pattern, and its dual cone, the cone of sparse matrices with the same pattern that have 
a positive semidefinite completion. Efficient large-scale algorithms for evaluating these barriers 
CO ■ and their derivatives are important in interior-point methods for nonsymmetric conic formula- 

tions of sparse semidefinite programs. The algorithms are based on the multifrontal method for 
sparse Cholesky factorization. 

u 
o 

C^ ' 1.1 Log-det barrier for sparse matrices 



1 Introduction 



We discuss algorithms for evaluating the gradient and Hessian of the 'log-det' barrier 

f{X) = -logdetX 
> ■ 

^_S I when X is large, sparse, and positive definite. We take / as a function from Sy to R, where V 

j~^ ' is the filled sparsity pattern of the Cholesky factor of X, and Sy denotes the set of symmetric 

CN ■ matrices with sparsity pattern V. With this convention, and for the standard trace inner product 

cn ! A* B = tr{AB) of symmetric matrices, the gradient of f at X is 
O 
^ : V/(X) = -P(X-i) (1) 

where V denotes projection on V, i.e., V{A)ij = Aij if the pattern V has a nonzero in position 
k> ' i,j and V{A)ij = otherwise. The algorithms presented in this paper exploit properties of filled 

$H ■ sparsity patterns (which are also known as chordal or triangulated patterns) to compute the gradient 

directly from the Cholesky factor of X, without calculating the rest of the inverse. 
The Hessian of /, interpreted as a function from Sy to R, is defined by 



V'fiX)[Y] = jVfiX + tY) 



= r{x-'Yx-') yy e s^. (2) 

t=0 



We are interested in efficient methods for evaluating this expression, possibly for multiple matrices 
Y simultaneously, without computing the entire inverse of X or the products X~^YX~^. We also 
discuss methods for evaluating the inverse Hessian \7'^f{X)~^[Y]. 
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The function / has an important role as a logarithmic barrier function for the convex cone 

S^_+ = {X e S^ I X ^ 0}, 
the cone of positive semidefinite matrices with sparsity pattern V. 

1.2 Conjugate barrier 

Equally important is the corresponding dual barrier function 

MS)= sup (-S»X-fiX)). (3) 

This is the conjugate or Legendre transform of / applied to —S. The function /* is a logarithmic 
barrier function for the dual cone of Sy,, which contains the symmetric matrices with sparsity 
pattern V that have a positive semidefinite completion: 



'y,c 



V{Sl) = {ViX) I X ^ 0}. 



The dual barrier f*{S) can be computed as f*{S) = logdetX — n where X is the maximizer in the 
definition of /*, i.e., the solution of the nonlinear equation V/(X) = —S or 

ViX-'^) = S (4) 

with variable X £ Sy. The gradient and Hessian of /* at S also follow from the maximizer X by 
applying standard properties of Legendre transforms: 

vMS) = -x, v^MS) = v^f{xr\ (5) 

For general sparsity patterns V, the maximizer X in ([3]) needs to be computed by iterative methods. 
For filled patterns V, however, efficient direct algorithms exist. The algorithms discussed in this 
paper compute a Cholesky factorization of X, given the matrix S, using a finite recursion that is 
very similar and comparable in cost to a Cholesky factorization. 

There is an interesting connection between the dual barrier f^ and the maximum determinant 
positive definite completion problem, which has been extensively studied in linear algebra [ GJSW841 
ILauOlj . The optimization problem in ([3]) is the Lagrange dual of the convex optimization problem 

minimize — log det Z — n , 

subject to r{Z) = S, ^ ' 

with variable Z G S". The primal and dual optimal solutions X and Z are related by the optimality 
condition Z~^ = X. The solution X of ([!]) is therefore the inverse of the maximum determinant 
positive definite completion of S. 

1.3 Applications 

Efficient gradient and Hessian evaluations for / and /* are critical to the performance of interior- 
point methods for conic optimization problems associated with the cones Sy^ and Sy^ |ADV10| , 



ISV04J . Consider the pair of primal and dual cone linear programs (LPs) 

minimize (Fx maximize —B • S (7) 

subjectto f:^iAi + X = B subjectto A. . 5 + c. = 0, i = l,...,m 

X e S^^+ 

with variables x G R™", X, 5 e Sy, and problem parameters Aj, S e Sy, c € R™. Cone LPs of this 
type have been studied in sparse semidefinite programming with the goal of exploiting aggregate 
sparsity in the coefficient matrices Ai and B JSV041 IBur031 lADVlO) . Matrix completion techniques 
and chordal sparse matrix properties were first applied to semidefinite programming algorithms by 
Fukuda et al. [FKMNOO] in a sparse implementation of the HRVW/KSH/M primal-dual algorithm. 

A primal barrier method for ([7]) requires at each iteration the evaluation of the gradient V/(X) 
and the solution of a positive definite equation HAy = g with coefficients Hij = Ai* (V'^ f {X)[Aj]) . 
Efficient techniques for evaluating the Hessian V'^f{X)[Aj] are therefore important in large-scale 
implementations. A dual barrier method for the cone programs involves evaluations of the gradient 
Vf^:{S) and a set of linear equations HAy = g with coefficients Hij = Ai» (V^/*(5)~^[Aj]). From 
the relations ([5]), we see that this requires the inverse X of the maximum determinant positive 
definite matrix completion of S and the evaluation of the Hessian Sl"^ f {X)[Aj] at X. We refer the 
reader to [ADVlOt ISV04J for more details. 

The problem of computing a projected inverse 'P{X~^) (and, more generally, computing a 
subset of the entries of the inverse of a sparse positive definite matrix) has also been studied in 
statistics |GP80t I ADR"*" 10 . Efficient algorithms for computing the gradient of / are important in 



maximum likelihood estimation problems involving Gaussian distributions, for example, in sparse 
inverse covariance selection |DVR08| . Consider, for example, the covariance selection problem with 
1-norm penalty 

minimize C • X — logdetX -|- yo||X||i, 

which has been studied by several authors [HLPLOOl IBEdOHi IFHTOHI IdBEnSl ISMCnOI iLTlnj . In 
this problem, X is the inverse covariance matrix of a Gaussian random variable and C is a sample 
covariance. The first two terms in the objective form the negative log-likelihood function of X (up 
to constants), and the penalty term is added to promote sparsity in the solution X. In problems 
of high dimension, it may be unrealistic and impractical to regard X as a dense matrix variable. 
Instead, one can start with a partially specified pattern based on prior knowledge, and use the 
penalized covariance selection to identify additional zeros. The problem can then be posed as an 
optimization problem over Sy, where V is the known sparsity pattern. This greatly simplifies 
the cost of calculating the gradient of the smooth terms in the objective and makes it possible to 
solve very large covariance selection problems using first-order methods that require the gradient 
of — log det X at each iteration. 

Other applications include matrix approximation problems with sparse positive definite matrices 
as variables, for example, computing sparse quasi- Newton updates ^Fle95tlYam08| . 

1.4 Related work and outline of the paper 

We refer to the algorithms in this paper as multifrontal and supernodal because of their resemblance 
to multifrontal and supernodal multifrontal algorithms for Cholesky factorization |DR831 lLiu92] . 



The multifrontal Cholesky factorization is reviewed in Section [3] and a supernodal variant, formu- 
lated in terms of clique trees, is described in Section [71 

In Section U we introduce multifrontal algorithms for computing the gradients of / and /=„ . 
Similar algorithms for evaluating V/ are discussed in |CD95[ I ADR"*" 10] . The close connection 
between the problem of computing the gradient of f{X) = — logdet X and a Cholesky factorization 
is easily understood from the chain rule of differentiation. A practical method for computing f{X) 
will calculate a sparse Cholesky factorization of X, for example, X = LDL^ with L unit lower 
triangular and D diagonal, and then evaluate f{X) = — ^^logDjj. By applying the chain rule 
to a sparse factorization algorithm, the gradient of f{X) can be evaluated at essentially the cost 
of the factorization itself. Moreover, the differentiation can be automated using reverse automatic 
differentiation software [GW08] . Although the algorithm in Section 14.11 can be obtained from the 
chain rule, we give a straightforward direct derivation. This not only simplifies the notation and 
description of the algorithm, it also helps reduce the memory requirements, which can be high in 
a straightforward application of reverse differentiation because of the large number of intermediate 
auxiliary matrices (update and frontal matrices) generated during the factorization. 

As mentioned earlier, evaluating the gradient V/*(5) is equivalent to inverting the mapping 
X I— )• V/(X), and an algorithm for evaluating V/=k is therefore easily derived from the algorithm 
for Vf (see Section |12|) . 

In Section \5\ we examine the problem of computing the Hessians and inverse Hessians of / 
and /*, and more specifically, the problem of evaluating expressions of the form V^/(X)[C/] and 
V^/*(5)[y]. Again, the algorithms follow conceptually from the chain rule and can be obtained 
by applying automatic differentiation techniques. An explicit description allows us to optimize the 
efficiency and memory requirements. This is particularly important in the case of sparse arguments 
U, V. As an important by-product, we define a factorization V^/(X) = T^'^'^J o 7^ and present 
efficient methods for evaluating the factors 7^ and T^'^'^J separately. 

The methods presented in the paper are closely related to the barrier evaluation algorithms 
of |DVR08t IDV09| . These algorithms were formulated as recursions over clique trees and can 
be interpreted as supernodal versions of the multifrontal algorithms presented in Sections [3H5l 
We elaborate on the connections in Sections [6] and [71 In contrast to the clique tree methods 
in [DVROSl IDV09J , the algorithms described here work with data structures that are widely used 
in sparse Cholesky factorization algorithms (namely, the compressed column storage format and 
elimination trees). As a result they are simpler to implement and more readily combined with 
techniques from the recent literature on sparse matrix factorization algorithms. Some possible 
further improvements in this direction are mentioned in the conclusions (Section [8]). 

1.5 Notation 

We identify a symmetric sparsity pattern V with the positions of its lower-triangular nonzeros. In 
other words, a symmetric matrix X has sparsity pattern V if Xij = Xji = for {i,j) V. The 
entries Xij and Xji for (i, j) G V are treated as (structurally) nonzero, although they are allowed to 
be numerically zero. A lower-triangular martrix L has sparsity pattern V if the symmetric matrix 
L + L has sparsity pattern V. 

An index set is a sorted subset of the integers {1, 2, . . . , n}. The number of elements in the 
index set I is denoted |/| and its kth element I{k). If I and J are two index sets with J C /, we 



define an |/| x \J\ matrix Ejj with entries 

' 1 /(O = Jij) 



^■' ' otherwise. 

This notation will be used in expressions EJjBEjj and EjjCEJj, which have the following meaning: 
if A is a symmetric matrix of order n and B = An is the principal submatrix indexed by /, then 
the matrix EJjBEjj is equal to Ajj, the principal submatrix indexed by J . The adjoint operation 
B = EjjCEJj, applied to a symmetric matrix C of order \J\, can be interpreted as first embedding 
C as the \J\ x | J|-block Ajj of an otherwise zero n x n matrix A, and then extracting the |/| x |/| 
submatrix B = Ajj. 

2 Elimination trees 

This section provides some background on sparse matrices and elimination trees |Liu90t IDavOG] . 
We define a Cholesky factorization as a factorization 

X = LDL^ 

with L unit lower-triangular and D positive diagonal. The sparsity pattern V of the Cholesky 
factor L has the following fundamental property: 

i>j>k, {i,k)ev, ij,k)ev =^ {i,j)ev. (8) 

(In other words, excluding accidental cancellation, Lj^ 7^ and Ljjt 7^ implies Lij ^ 0.) This 
property distinguishes a Cholesky factor from a general sparse lower-triangular matrix. In the 
example in Figure [H the presence of nonzeros in positions (5,3) and (15,3) implies that the entry 
(15,5) is nonzero. The nonzeros in positions (9,5), (15,5), and (16,5) imply that the entries in 
positions (15,9), (16,9), and (16,15) are nonzero. 

We use the notation I^ to denote the sorted set of row indices of the nonzero entries below the 
diagonal in column k of L. We also define Jk = Ik^ {k}- The property ([8]) implies that Ik defines a 
complete subgraph of the filled graph, i.e., the matrix Lj^j^, is a dense lower-triangular matrix. For 
example, it can be verified that the submatrix indexed by J5 = {5,9, 15, 16} in Figure [His dense. 
The number of nonzeros below the diagonal in column /c, i.e., the cardinality |/fc| of I^, is called 
the monotone degree of vertex k. 

The elimination tree (etree) is defined in terms of the sparsity pattern of the factor L as follows. 
It is a tree (or a forest if L is reducible) with n vertices, labeled 1 to n. The parent of vertex k is 
the row index j of the first nonzero below the diagonal of column k of L, i.e., the vertex min/fc. 
As a consequence, each vertex has a lower index than its parent, so the vertices in the elimination 
tree are numbered in a topological ordering. An example is shown in Figured] 

We will use two important properties of elimination trees. 

Theorem 1 [Liu901 theorem 3.1] If j G I^, then j is an ancestor of k in the elimination tree. 
Note that the converse does not hold. 

Theorem 2 [Liu92l theorem 3.1] If vertex j is an ancestor of vertex k in the elimination tree, then 
the nonzero structure of (Lj^, Lj^ik, . . . , L^k) is contained in the structure of {Ljj,Lj^ij, . . . , Lnj). 
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Figure 1: Filled pattern and elimination tree. The numbers next to vertices in the elimination tree 
are the monotone degrees \I ki- 
ln the notation for the column structure introduced above, this theorem asserts that if j is an 
ancestor of k, then 

hf^{j,j + l,...,n} C Jj. 

In particular, if j is the parent of k (hence, by definition, j is the first element of Ik and therefore 
4 ^ {j,j + l,...,n}), then 

h^Jj, |4|<|/j| + l. (9) 

By applying these inequalities recursively to a path ii, i2, ■ ■ ■ , ir from a vertex ii in the elimination 
tree to one of its ancestors ir , we obtain a chain of inclusions 



Ih ^ Ji2 ^ {^2}UJi3 C ••• C {i2,i3,...,ir-l}U Ji,, 



and inequalities 



\Ih\ < |/*2l + l < \Iis\ + 2 < 



< |/iJ+r-l. 



(10) 



(11) 



This can be verified in the elimination tree in Figure [TJ As we move along a path from a leaf vertex 
to the root of the tree, the monotone degrees can increase or decrease, but they never decrease by 
more than one per step. 



3 Cholesky factorization and multiplication 



In this section, we review the multifrontal algorithm for Cholesky factorization |DR831lLiu92] . We 
then describe a similar algorithm for the related problem of computing a matrix, given its Cholesky 
factors. 



3.1 Cholesky factorization 

Recall that we define the Cholesky factorization as a decomposition X = LDL , with D positive 
diagonal and L unit lower-triangular. The formulas for L and D are easily derived from the equation 



X = LDL^ . The Jj x Jj block of the factorization is 



X, 






^3J 

k<j 
The first column of the equation is 







Ljk 


T 

+ Djj 


1 

. ^Ij3 . 




1 



7 ^ L>kkLjk 
k<j 



Ljk 

Lt,u 



+ D 



J3 



2_^L)kk 
k>j 



1 



Li^k 



Li^k 



(12) 



The multifrontal algorithm takes advantage of properties of the elimination tree associated with L 
to compute the sum on the right-hand side. First, we recall that Ljk ^ only if /c is a descendant 
of j in the elimination tree (Theorem [T]) . The sum in (I12p can therefore be replaced by a sum over 
the proper descendants of j. The set of proper descendants of vertex j is 

Tj\{j}= U ^- 

JGch{j) 

where Tj is the subtree of the elimination tree rooted at vertex j and ch(j) are the children of 
vertex j. The equation (J12p then becomes 






^ ^ DkkLjk 



Ljk 
Lj.k 



D 



33 



1 
^Ij3 



(13) 



(14) 



Second, suppose that for each vertex i in the elimination tree, we define a dense matrix 

f^ = - 2^ DkhLi-kLi^k- 

k£T, 

The matrix Ui is called the update matrix for vertex i. Using the definition of Ejj in Section [1.51 
we can write 



Ljk 
Lj,k 



Ljk 
Lj,k 



k&T, 

This follows from Theorem [21 if i G ch(j) and k £ Ti, then 



Ej.hUiEjr^. 



(15) 



4n{i,j + i,...,n} c/iC Jj 

and therefore Ljk = Ej.j.Lj.k- (The multiplication with Ej.j. copies the entries Lj.k to the correct 
position in Ljk and inserts zeros for the other entries.) Adding the first columns of Eji-UiEj^, 
for all i G ch(j) therefore gives the first term on the right-hand side of p3p . 

Thus, by combining the nonzero lower-triangular entries in column j of X (the left-hand side 
of (jlSp ) and the update matrices of the children of vertex j (to assemble the sum on the right-hand 
side), we collect all the information needed to compute Djj and Lj.j from (|13p . 



Furthermore, the same equation (J13p shows how the update matrix Uj for vertex j can be 
calculated. This is clearer if we rewrite (I13p in matrix form using (IISD as 






H, 



iGch(j) 



D 



J3 



1 



1 



1 



D 



30 







u, 





[/,- 



1 Lf., 
/ 



(16) 

(17) 



The first column of this equation is identical to ()13p . The 2,2 block follows from the definition of 
Uj and the identity (fT5]) . The matrix on the left-hand side of p!7j) is called the jth frontal matrix. 
The equation (llTh shows that once the frontal matrix has been assembled, we can compute Djj, 
Lj.j, and Uj by a pivot step. 

The resulting algorithm to compute L, D, given a positive definite X, is summarized below. 



Algorithm 3.1. Cholesky factorization. 

Input. A positive definite matrix X. 

Output. The factors L, D in the Cholesky factorization X = LDL^. 

Algorithm. Iterate over j E {1, . . . ,n} in topological order (i.e., visiting each vertex of 
the elimination tree before its parent). For each j, form the frontal matrix 



-^11 -^21 
-^21 -^22 



X 



33 ^I, 



j3 



jech(j) 



(18) 



and calculate Djj, the jth column of L, and the jth update matrix Uj from 



D 



33 



Fi 



111 



^Ij3 



1 



D 



■F, 



21, 



Uj - F22 - DjjLj^jLj^j. 



(19) 



33 



In a practical implementation, with the lower-triangular part of X stored in a sparse format 
(typically, the compressed column structure or CCS; see |Dav0 6] ) . one can overwrite Xjj with Djj 
and Xj.j with Lj.j after cycle j. The auxiliary matrices Fj and U are stored as dense matrices 
(either as two separate arrays or by letting Uj overwrite the 2, 2 block of Fj). The main step in the 
algorithm is the level-2 BLAS operation in the calculation of Uj in ()19p |DCHH88] . The frontal 
matrix Fj can be discarded after the vertex j has been processed. The update matrix Uj can be 
discarded after the parent of vertex j has been processed. 



3.2 Cholesky multiplication 



The equation (J16p also shows how the jth column of X can be computed from Djj, column j 
of L, and the update matrices for the children of vertex j. This yields an algorithm for the 
inverse operation of the Cholesky factorization, i.e., the matrix multiplication LDL^ , which will 
be important in Section HI 



Algorithm 3.2. Cholesky product. 



Input. Cholesky factors L, D. 

Output. The matrix X = LDL^ . 

Algorithm. Iterate over j G {1, . . . ,n} in topological order. For each j, calculate Xjj, 



Xj.j, and Uj from 



^jj Xjjj 



Xi. 



33 



-Ui 



Dn 



1 



1 



-iT 



E J^J.uU^^lu- (20) 

iGch(j) 



The matrices Ui can be deleted after the parent of vertex i has been processed. In a practical 
implementation, we compute the left-hand side of (|2Up as a dense matrix, via a level-2 BLAS 
operation for the outer-product on the right-hand side. Then Xjj and Xj.j are copied to the CCS 
structure for X. 

4 Gradients 



In this section we describe 'multifrontal' algorithms for evaluating the gradients of the barrier and 
the dual barrier. 

Recall that the primal gradient is defined as V/(x) = — P(X^^), where V denotes projection 
on the filled pattern V of X. It is straightforward to show that the entries of the projected 
inverse V^X^"^) can be computed directly from the Cholesky factors L, D without calculating any 
entries of X^^ outside V (see Section 14. ip . This observation is the basis of several algorithms 
published in the literature. The algorithm we describe here is equivalent to the inverse multifrontal 
algorithm in [CD95] . but we give a different and shorter derivation. The projected inverse algorithm 
in |DVR08j can be viewed as a supernodal variant of the algorithm discussed here (see the discussion 
in Section [7j). Another closely related algorithm is described by Amestoy et al. jADR"*" 10] . who 
consider the problem of computing a few entries of the inverse of a large sparse matrix. 

Evaluation of the dual gradient corresponds to the inverse operation, i.e., the problem of solving 
the nonlinear equation V^X^^) = S with variable X G Sy. A multifrontal algorithm for this 
problem is derived in Section 14. 2[ 



4.1 Primal gradient 

Define Z = X~^ and S = V{Z). We are interested in an efficient method for computing S from 
the Cholesky factors L and D of X. The matrix Z = L~^D~^L~^ satisfies 



ZL = L-'^D-^. 



The Jj X j block of this equation only involves entries of Z in the projection S = V{Z): 



^jj ^li 









Hi 







(21) 



The vertices of Ij are ancestors of vertex j (Theorem [T]). Therefore, if we calculate the columns of 
S following a reverse topological order of the vertices of the elimination tree, then the matrix Sij. 
is known when we arrive at column j. Given Sj-j., it is easy to compute Sjj and Sjj from (|2ip : 



Si. 



il 



-Si-uLi, 



j-'j -'jJ' 



S^ 



3] 



D 



JO 



Si^jLijj. 



(22) 



Accessing the vectors Lj j and Sj j is easy if L and the lower-triangular part of S are stored in a 
CCS data structure. Retrieving Sj-i^ from the CCS representation of S can be avoided by using 
an idea similar to the multifrontal Cholesky algorithm. For each vertex j of the elimination tree, 
we define a dense 'update matrix' 

It follows from the properties of the elimination tree (theorem [2] and equation ([9])) and the definition 
of Ej.j^ that if i is a child of vertex j, then 



y^ = Eiu 






Vj 



^J,u- 



By using this formula to propagate Vi, we obtain a 'multifrontal' algorithm for computing 'P{X 



-1^ 



Algorithm 4.1. Projected inverse. 

Input. The Cholesky factors L, D of a positive definite matrix X. 

Output. The projected inverse S = V{X~^) = — V/(X). 

Algorithm. Iterate over j = {1, 2, . . . , n} in reverse topological order (i.e., visiting each 
vertex before its children). For each j, calculate Sjj and Sj^j from 



Sijj = -VjLijj, 



S^ 



33 



D 



33 



Si^jLijj, 



and compute the update matrices 



Vi = E 



J,h 



iJ j 4 O 



S 



■'33 
'h3 



Ij3 



V, 



Ej.i^, i€di{j). 



(23) 



(24) 



The matrix Vj can be discarded after cycle j, and Sij and Sjj can overwrite Lj.j and Djj in a CCS 
data structure. As in Algorithm 13. H the algorithm involves operations with the dense matrices Vj 
and 



Q.. qT 

^33 ^Ijj 



Sh3 Siji, 
The main calculation is a level-2 BLAS operation (the matrix- vector product VjLj.j) 



10 



4.2 Dual gradient 

As mentioned in the introduction, the solution X € Sy of the equation V{X~ 



= S \s the inverse 

of the maximum determinant positive definite completion of S, and it is also the negative of Wf^,{S). 
The Cholesky factors of X can be computed by solving for Lj 
algorithm. 



J and Djj from ()22p as in the following 



Algorithm 4.2. Matrix completion. 

Input. A matrix S G Sy that has a positive definite completion. 

Output. The Cholesky factors L, D of X = — V/*(S), i.e., the positive definite matrix 
X E S^ that satisfies V{X-^) = S. 

Algorithm. Iterate over j G {1, 2, . . . , n} in reverse topological order. For each j, com- 
pute Djj and the jth column of L from 



^hi 



and compute the update matrices 



(25) 



Vi = E 



JjH 



iJ j 4 O 



■'n 



hi 



V. 



Ej^h, i G ch(j). 



(26) 



The update matrix Vj can be discarded after cycle j, and Lij and Djj can overwrite Sjj and 
Sjj in a CCS data structure. 

The cost of Algorithm 14.21 is higher than that of Algorithm 14.11 because step (I25p involves the 
solution of an equation with Vj as coefficient matrix, whereas (|24p only requires a multiplication. 
It is therefore of interest to propagate a factorization of Vj instead of the matrix itself, and to 
replace ()26p by an efficient method for computing a factorization of Vi, given a factorization of Vj. 
This idea can be implemented as follows. We use a factorization of the form Vj = RjRJ , with Rj 
upper triangular of order \Ij\. We need to replace ()26p with an efficient method for computing Ri 
from Rj. The matrix Ejj. can be partitioned as 



^Jil^ 



1 

E 



where E = Ei.i.\fj\. (This follows from the fact that j is the first element of Jj by definition of Jj, 
and J is also the first element of /j if j is the parent of vertex i. As a consequence, the 1,1 element 
of matrix Ej^j^ is equal to one). Prom (j26|) . 



V = E 



J,h 






oT 

V, 



Jjli 



a 5^ 
b CC^ 



where 



5. 



JJ' 



E'^Sl 



J' 



c 



E Rj. 



The matrix C is obtained from the upper triangular matrix Rj by deleting the rows in Jj \ li- It 
can be reduced to square upper triangular form by writing it as C = RQ^ with R upper triangular 



11 



and Q orthogonal (a product of Householder transformations |GV96j ). Then the triangular factor 
in Vi = RiRf is given by 

R 

This is summarized below. 



Rj 



Algorithm 4.3. Matrix completion with factored update matrices. 

Input. A matrix 5 G Sy that has a positive definite completion. 

Output. The Cholesky factors L, D of X = — V/*(S'), i.e., of the positive definite matrix 
A G S^ that satisfies V{X~^) = S. 

Algorithm. Iterate over j G {1, 2, . . . , n} in reverse topological order. For each j, 

• compute Djj and the j'th column of L from 



L/,,- - -Rj R- Si^j, Djj - {Sjj + Sj^jLi^j) 



for i G ch(j), compute a factorization 



E 



ijMij} 



Rj — RQ 



with R upper triangular and Q orthogonal, and compute 



Rj 



{s,,-\\R-^s,^,\\iy'' {R-^sj^,y 

R 



(27) 



(28) 



(29) 



The cost of step (p7|l is order |Ijp for the forward and back substitutions, and the cost of ([29]) 
is proportional to |Ijp (for the computation of R~^Si-j). The cost of the reduction to triangular 
form in ([28|) is difficult to quantify because it depends on the number of rows in Rj that are deleted 
in the multiplication E Rj and on their positions. However the total cost is usually much less than 
the cost of computing Ri from scratch, as shown by the experiments in the next section. 



4.3 Numerical results 

In this section, we give experimental results with Algorithms 14.1] 14.2] and 14.31 The algorithms 
were implemented in Python 2.7, using the Python library CVXOPT version 1.1.3 [DVlOj and its 
interfaces to LAPACK and BLAy^l for the sparse and dense matrix computations. Some critical 
code segments were implemented in C (such as the Householder updates in Algorithm 14.31 and the 
"extend-add" operation F := F + EjjUEjj and its adjoint.) The experiments were conducted on 
an Intel Q6600 CPU (2.4 GHz Core 2 Quad) computer with 4 GB memory, running Ubuntu 11.04. 
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w + 1 




© 



Figure 2: Lower-triangular band pattern with bandwidth w and the corresponding eUmination 
tree. The vertices 1, . . . , n — w have monotone degree w. The vertices k = n — w + 1, . . . ,n have 
monotone degree n — k. 



w 




n-\ 




Figure 3: Lower-triangular arrow pattern with width w and the corresponding elimination tree. 
The vertices 1, . . . , n — w have monotone degree w. The vertices fe = n — t(;-|-l,...,n have monotone 
degree n — k. 
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Algorithm n 
Algorithm O 
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w 



w' 



Algorithm IXTl 
Algorithm m 
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W 



Figure 4: Cost of computing the primal gradient (Algorithm 14. ip and the dual gradient (Algo- 
rithms 112] and [4]3]) for (a) band patterns and (b) arrow patterns of order n = 2000, as a function 
of the pattern width w. 



4.3.1 Band and arrow patterns 

Band and arrow patterns are two basic sparsity patterns for which the complexity of the algorithms 
is easy to analyze. We assume n ^ w, where w is the bandwidth or blockwidth (see Figures [2] 
and [3]) . It is easy to see that the cost of a Cholesky factorization of a matrix with one of these two 
patterns is 0{nw'^). 

Step ()23p of Algorithm 14. II involves matrix- vector multiplications of order \Ij\. The complexity 
of the algorithm is dominated by the total cost of these products, i.e., 0{nw'^). This is similar to 
the cost of a Cholesky factorization. Step (j25p of Algorithm 14.21 on the other hand requires solving 



a dense positive definite system of order \Ij\. The total complexity is therefore 0{nw^ 
In Algorithm 14. 3| the cost of step ()27p is reduced to 0{w'^) per iteration. For j = 



n — w + 1, 

. . . , n, we have E = I in step (I24p . so C is upper triangular and we only need to compute R~^b. 
For the other vertices in the elimination tree (j = 1, . . . ,n — w), C = E^Rj is the matrix Rj with 
one row deleted: the last row in the case of a band pattern, the first row in the case of an arrow 
pattern. The cost of reducing C to triangular form is therefore zero in the case of an arrow pattern 
and O(w^) in the case of a band pattern. In either case, the total cost of Algorithm 14.31 is reduced 
to 0{nw'^). 

In Figure m we compare the CPU times of the three algorithms as a function of the width w. As 
can be seen, the cost of computing the dual gradient using Algorithm 14.31 is comparable to the cost 
of the primal gradient using Algorithm 14. H and the cost of the two algorithms grows roughly as 
w"^ for fixed n. Notice the small gap between the cost of Algorithms 14.31 and 14.11 for band patterns. 
This gap refiects the cost of reducing C to triangular form in Algorithm 14.31 



4.3.2 General sparse patterns 

In the second experiment, we use a benchmark set of large symmetric sparsity patterns from the 
University of Florida Sparse Matrix Collection |Dav09| . The AMD ordering was used to compute 



^We link against the single-threaded reference implementations of BLAS and LAPACK in Ubuntu. 
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Figure 5: Scatter plot of n versus (a) the density {\V\/{n{n + \)/2)) and (b) the number of nonzeros 
in L for the 128 test problems. Each dot represents a problem from the University of Florida Sparse 
Matrix collection. 



filled patterns. To prevent out-of-core computations, we restrict the experiment to matrices for 
which the filled pattern occupies less than 250 MB of memory. The set of test problems includes 
128 sparsity patterns, with n ranging from 500 to 204316 and with \V\ between 817 and 15,894,180. 
A scatter plot of the number of nonzeros and the density of the test problems versus the dimension 
n is shown in Figure [H 

Figure [6] shows the CPU times for Algorithm 14. II (primal gradient or projected inverse) and 14.31 
(dual gradient or completion), and for Algorithms 14. 31 and 14. 21 (completion with and without House- 
holder updates, respectively). Each dot represents one of the sparsity patterns in the test set. The 
results indicate that in practice, on this set of realistic sparsity patterns, the costs of computing 
the primal and dual gradients are comparable. 



5 Hessian 

The Hessian H = V^f{X) of / at AT is defined as 

■H{Y) =V{X-'^YX-'^) = 



-P(.Y + *r)- 



i=0 



A method for evaluating T-L(Y) can therefore be found by differentiating the formulas for evaluating 
the gradient —V{X~^). As we have seen, V{X~^) is obtained in two stages. First the Cholesky fac- 
tors L, D of X are computed, column by column, following a topological ordering of the elimination 
tree (Algorithm 13. ip . Then the projected inverse V{X~^) is computed from L and D, column by 
column, in reverse topological order (Algorithm 14. ip . Linearizing the two algorithms will provide 
an algorithm for 'H(l'). We give the details in Section [5.21 

We also consider the problem of evaluating Y = 'H~'^{T), i.e., solving the linear equation 

V{X-^YX-^) = T 
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Algorithm 14. 11 projected inverse (sec 



10- 
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Algorithm 14. 21 completion (sec.) 
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Figure 6: CPU times for primal and dual gradient algorithms for a test set of 128 sparsity patterns: 
(a) shows the cost of Algorithm 14.11 (primal gradient or projected inverse) versus the cost of Al- 
gorithm |13] (dual gradient or matrix completion) for different sparsity patterns; (b) compares the 
cost of computing the dual gradient via Algorithms 14.31 and 14.21 i. e.. with and without the update 
technique described in Section 14. 2[ 



for Y G Sy, given T £ Sy. An algorithm for this problem can be formulated by inverting the calcu- 
lation oi'H{Y) or, alternatively, by linearizing the algorithms for matrix completion (Algorithms 14.21 
and 14. 3p and the Cholesky product (Algorithm 13. 2p ; see Section 15.31 

In applications, it is often useful to know a factorization of the Hessian as a composition of a 
mapping and its adjoint, 

?^(y) = 7^^^j(7^(y)). 

Such a factorization is discussed in Section 15.41 



5.1 Linearized Cholesky factorization and matrix completion 

Let L{t), D{t) be the matrices in the factorization X{t) = X + tY = L{t)D{t)L{tY and let Ui{t) 
be the ith update matrix in the multifrontal factorization algorithm for X{t), i.e., 

^iW = - Z] Dkk{t)Li^k{t)Lj^k{tf. 

We denote by L', D' , U[ the derivatives of L(t), D{t), Ui{t) at t = 0. These derivatives can be 
found by linearizing the equation (J16p with X replaced by X + tY , 






Dnit) 



+ t 



Y Y^ 
Yj^, 



1 



1 



igch(j) 
T r 

r 

^ Uj{t) 
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Taking the derivatives of the left- and right-hand sides at t = gives 






iech(j) 



1 
Li., I 



I \T 



D',, {D,,L'j^^) 



3] 



U', 



1 Lj. 



jJ 



/ 



(30) 



This will be the key equation for computing the linearized Cholesky factors D', L' from Y, and 
conversely, the linearized Cholesky product Y from the linearized factors D', L'. 
Similarly, we define Z{t) = {X + tY)-^, S{t) = V{Z{t)), and 

We write the derivatives of S{t) and Vi{t) at t = as —T and —¥(■ Substituting S{t), L{t), D{t) 
for S, L, L> in dH]) gives 



Si At) Si.m 



and differentiating with respect to t gives 



1 

Li,j{t) 



l/D^t) 







^. ^ 



1 
Li 



jj 



+ 



^3J 



'3J 



Si,j Si, 



^ji 



L 



iji 



-D'JD^, 



J2. "1 

'03' 3d 





Using Sij = —Sii-Lij (from (122^ ) this can be written in a more symmetric form as 



^33 -^Ijj 



1 
Li,3 I 



1 -Ll 



]3 



^'331^3 ^hh^l.3 



I \T 
'j3 



I.e., 



1 Lh 



33 



/ 



^33 ^Ijj 



1 

Ll,3 I 



^'331^3 iSl,lA,3)^ 
SijijL'i^j 



J -"J jjj 



^3 



(31) 



This equation allows us to compute T given the linearized factors D', L', and conversely, compute 
D\ L' given T. 



5.2 Hessian 

The algorithm for computing T = Ti-iY) first computes L', D' by the linearized Cholesky factor- 
ization, i.e., from (j30p . and then T from L', D' by the linearized projected inverse algorithm, i.e., 
from (I3ip . To simplify the notation, we define two matrices K, M G Sy as 



for j = 1, . . . ,n. 



M. 



£>'. 



33-jjf^ ^Ij3 = Sl.IjL'i^j, 

33 



(32) 
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Algorithm 5.1. Hessian evaluation. 

Input. A matrix Y E Sy, the Cholesky factors L, D of a positive definite matrix X G Sy, 
and the projected inverse S = V{X^^). 

Output. The matrix T = n{Y) = V{X-^YX-^) where % is the Hessian of / at X. 

Algorithm. 

1. Iterate over j e {1, 2, . . . , n} in topological order. For each j, calculate I?' ■, the 
j'th column of K, and the update matrix U'- via 













1 ■ 


i 




[ -Li,, 


I 


[ 



Y ■ Y"^ "1 
Yii 



i€ch{j) 



1 -Ll^ 
/ 



2. For j G {1, 2, . . . , n}, compute column j of M via 



M, 



K. 



1 



3J 



T#, M.,., = —Si.^Ki, 



n 






3. Iterate over j G {1, 2, . . . , n} in reverse topological order. For each j, calculate 
Tjj and T/^j from 



r.. 


^h 




" 1 


-i?. 


Ti,^ 


^' 




i 


/ 






1 
-Lli I 



and the update matrices Vl for the children of vertex j via 



^/ = Ell 






-E^J,/,, iech(j). 



The vertices j in step 2 can be ordered in any order. However, by defining Vj = Sjj. as in 
Algorithm 14.11 and using a reverse topological ordering, we can avoid having to extract 5//. from 
the CCS structure of 5. In the modified algorithm, the second step is replaced by 



M. 



K 



n 



3] 






Mi,j = -^V,Kj^„ 



Vi = E 



Jjii 



Q.. qT 

Si,j Vj 



Ej^l,, iGch(j), 



for j € {1, 2, ... , n} in a reverse topological order. 



5.3 Inverse Hessian 



To evaluate Y = T-L^^iT), we use the equation (j30p to compute the linearized Cholesky factors D', 
L' from T, and the equation (|31|) to compute Y from D' , L'. We use the same notation (|32|) as in 
the previous section. 
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Algorithm 5.2. Inverse Hessian evaluation. 



Input. A matrix T G S^, the Cholesky L, D of a positive definite matrix X G Sy, and 
the projected inverse S = V{X~^). 

Output. The matrix Y = T-l~^(T), i.e., the solution Y € Sy of the equation liiY) = 
V{X-^YX-^) = T. 

Algorithm. 

1. Iterate over j G {1, 2, . . . , n} in reverse topological order. For each j, calculate 
the jth column of M from 



Mi. V' 



1 ^h 






"^3] 



1 
Li,3 I 



and the update matrices Vl for the children of vertex j via 



Vl = Ej. 



Ti^3 y; 



Ej^l,, i£ch{j). 



2. For j G {1,2,... , n}, compute column j of K via 

3. Iterate over j £ {1,2, ... ,n} in topological order. For each j, compute C/' and 
the jth column of Y from 



iech{j) 



y^ 


^. 




1 


■ 




\ K,, 


^l] 




[ 1 if, 1 


Yi.3 


-f^- 




[ ^h3 


1 




[ Ki,j 







[ I \ 



A improvement of step 2 is to use a factorization S"//. 

recursively, following a reverse topological order, as discussed at the end of Section HI 



-RjRj and update the matrices Rj 



5.4 Hessian factor 

Step 1 in the Hessian evaluation algorithm (Algorithm 15. ip is a linear mapping that transforms 
Y to K. Step 3 is a linear mapping that transforms M to T. It is interesting to note that these 
two mappings are adjoints. Step 2 implements a self-adjoint and positive definite mapping, which 
transforms K to M. Factoring the positive definite mapping in step 2 provides a factorization 

with TZ a linear mapping from S^ to Sy. The factorization of the mapping in step 2 can be 
implemented by defining a factorization Sji- = RjRJ with Rj upper triangular for each vertex j. 
Although it is impractical to pre-compute and store the matrices Rj for each vertex, they can be 
efficiently computed recursively in a reverse topological order, as in Algorithm 14.31 For the sake of 
clarity, we omit the details in the following algorithms. 

The algorithm for evaluating TZ consists of step 1 in Algorithm 15.11 and one half of step 2. 
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Algorithm 5.3. Evaluation of Hessian factor. 

Input. A matrix Y £ Sy, the Cholesky factors L, Z) of a positive definite matrix X G Sy , 
and the projected inverse S = V{X^^). 

Output. The matrix W = TZ{Y) where V. = IZ^^^ o 7^ is the Hessian of / at X. 

Algorithm. 

1. Iterate over j G {1, 2, . . . , n} in topological order. For each j, calculate the jth 
column of K^ and the update matrix [/• via 













1 ■ 


I 




[ -Li,, 


I 


\ 



Y ■ Y'^ 
Yj^, 



i€ch{j) 



2. For all j S {1, 2, . . . , n}, compute column j of W via 






-RjKj^j 



1 -Lj. 
/ 



jj 



where Rj is a triangular factor of Sj^j^ = RjRi ■ 



The algorithm for evaluating T^'^'^J consists of the second half of step 2 of Algorithm 15.11 and of 
step 3. It also readily follows by taking the adjoint of the calculations in Algorithm | 



Algorithm 5.4. Evaluation of adjoint Hessian factor. 

Input. A matrix W E Sy, the Cholesky factors L, D of a positive definite X G Sy, and 
the projected inverse S = V{X~^). 

Output. The matrix T = W'^^iW) where H = TZ^^ o 7^ is the Hessian of / at X. 

Algorithm. 

1. For all J G {1, 2, . . . , n}, compute column j of M via 

1 



M. 



Wn 



33 



D 



Mj^, = -^=R,Wj^r 

33 V ^jj 



2. Iterate over j £ {1, 2, . . . , n} in reverse topological order. For each j, calculate 
the jth column of T from 



Ti,3 y, 



1 -Lf,,. 

^]3 










1 
-Li,3 I 



and the update matrices VI for the children of vertex j via 



Vl = Ejr, 






Ejjl,, i G ch(j). 
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Figure 7: Sparsity pattern of the lower triangular part of W = 'R-{Y) when (a) Y has a single 
diagonal nonzero in position (2,2), marked with a solid square, and (b) when Y has an off-diagonal 
nonzero in position (4,2). The solid entries mark the nonzero elements in W. The gray markers in 
(b) correspond to nonzeros in W that are introduced by the scaling in step 2 of Algorithm 15.31 i.e., 
these entries are not present in the intermediate variable K. 



5.5 Sparse arguments 

In many applications, such as interior-point methods for the cone programs ([7]) mentioned in the 
introduction, the Hessian T-L = V^f{X) and its factorization Ti = TZ^'^^ o TZ are needed to compute 
coefficients 

H,j = Ai . {Vf{X)[Aj]) = n{A.i) . n{Aj) 

for m matrices Ai , . . . , Am € S^ . The matrices Ai are often very sparse relative to the sparsity pat- 
tern V . In this section, we examine the implications of sparsity in the matrix Y on the computation 
oiW = TliY) using Algorithm 15.31 

From step 1 in Algorithm 15. 31 we see that if Yjj ^ 0, then Ki j and U',- are nonzero and dense. In 
step 1 these nonzeros are then further propagated via the recursion in topological order to all the 
columns indexed by ancestors of j. Therefore, if Yjj ^ 0, then Wj^,k 7^ are dense for all ancestors 
k of j. 

To see how an off-diagonal nonzero in Y affects the sparsity pattern of W , suppose that Yjj = 0, 
Yij 7^ for some i G Ij, and C/^ = for all k G ch(j). Then Wpq / for all ancestors q of j and 
p £ Jj n {i, z + 1, . . . , n}. Hence nonzeros in column j of Y create fill in the columns that correspond 
to ancestors of j. In Algorithm 15. 3|, we can therefore prune the elimination tree at node k if all the 
descendants of node k correspond to columns in Y with no lower triangular nonzeros. 

Figure [7] shows examples of the sparsity pattern of W when Y has a diagonal and an off-diagonal 
nonzero, respectively, for the pattern V in Figured) 



Numerical results To evaluate the benefits of exploiting additional sparsity in the argument, 
we have implemented and tested a version of Algorithm 15.31 that exploits sparsity in Y relative 
to V. In the experiment, we use as test data a set of randomly generated problems with sparsity 
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Sparsity pattern 




n 


Dense 

(seconds) 


Sparse 
(seconds) 


Ratio 


HB/platl919 




1919 


0.17 


0.05 


3.7 


HB/bcsstkl3 




2003 


1.40 


0.36 


3.9 


HB/lshp3025 




3025 


0.21 


0.05 


4.6 


Boing/nasa4704 




4704 


1.05 


0.27 


4.0 


TKK/g3rmt3m3 




5357 


1.43 


0.35 


4.1 


Schenk_IBMNA/c- 


-36 


7479 


0.37 


0.06 


6.1 


Wang/swangl 




10800 


7.05 


1.72 


4.1 


ACUSIM/Pres_Poisson 


14822 


17.65 


4.35 


4.1 


GHS.psdef/wathenlOO 


30401 


6.05 


1.38 


4.4 



Table 1: Average computational times (10 trials) for Algorithm 15. 31 with and without the technique 
for sparse arguments (columns 'sparse' and 'dense', respectively) when applied to sparse arguments. 
The sparse arguments are randomly generated with two lower-triangular nonzero entries in random 
positions. 

patterns from the University of Florida Sparse Matrix Collection. For each sparsity pattern, we 
generate ten sparse arguments with just two lower-triangular nonzero entries in random positions. 
Table [T] summarizes the average time required to compute W = TZ^Y) using Algorithm 15.31 with 
and without the techniques described in this section. For these very sparse arguments, pruning the 
elimination tree results in average speedups in the range 4-6, but in general the speedup depends 
both on the number of nonzeros and on the position of the nonzeros. 

Finally, we remark that additional computational savings can be made in step 2 of Algorithm 15. 31 
when TZ{Y) is needed for several arguments Y. Specifically, the triangular factors Rj of Sj^i^ = 
RjRj need only be computed once. 

6 Cliques and clique trees 

It is known that the performance of sparse Cholesky factorization algorithms on modern computers 
can be improved by combining groups of vertices into supernodes and applying block elimination to 
the corresponding columns. Several definitions of supernodes exist in the literature. In this paper, 
we define a supernode as a maximal group of columns of L (sorted, but not necessarily contiguous) 
that share the same nonzero structure. More specifically, if A^ is a supernode and j = max A^, then 
for all ke N, 

Jk = {NUlj)r\{k,k + l,...,n}. 

In the example of Figure [H the sets 

{1}, {2}, {3,4}, {5,9}, {6}, {7,8}, {10,11}, {12,13,14}, {15,16,17} 

form supernodes. (Another more common definition adds the requirement that the indices in A^ 
are contiguous |LNP93] : this can be achieved from a set of supernodes as defined above by a simple 
reordering.) In the context of multifrontal factorizations, the grouping into supernodes has the 
advantage that only one frontal matrix is required per supernode. This reduces the memory and 
arithmetic overhead incurred for the assembly of frontal matrices. Moreover, the block operations 
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allow us to replace matrix-vector operations (level-2 BLAS) with more efficient matrix-matrix 
operations (level-3 BLAS) [DCHD90]. 

Supernodes are closely related to cliques in the filled graph and the barrier algorithms described 
in [DVROSl IDV09J . which involve iterations on clique trees, can be interpreted as supernodal mul- 
tifrontal algorithms. We therefore start the discussion with a review of cliques and clique trees, 
and their connections with supernodes. 

6.1 Cliques 

A filled graph is also known as a chordal or triangulated graph. A clique is a maximal set of vertices 
that define a complete subgraph of the filled graph. Equivalently, a clique is a set of indices that 
define a dense lower-triangular principal subblock of L. Every clique Vl^ in a filled graph can be 
expressed asW = Ji, where i = minM^, the least element in W |BP93l proposition 2]. This follows 
from the fact that the index sets Jj define complete subgraphs, as noted in Section [2j Hence, if i 
is the lowest index in the clique W, then VF C Jj. Since W is maximal, we must have W = Ji. 
The vertex i = min W is called the representative vertex of the clique. Since there are at most n 
representative vertices, a filled graph can have at most n cliques. 

Efficient algorithms for identifying the representative vertices can be derived from the following 
criterion: the cliques are exactly the sets Jj for which there exists no j < i with Jj C Jj [LPP891 
proposition 3]. This follows from the characterization of cliques in terms of representative vertices. 
If Jj C Jj for some j < i, then Jj is certainly not a clique, since it is strictly included in another 
complete subgraph. Conversely, if Jj is not a clique, i.e., Ji C W for some clique W, and j is the 
representative vertex of W, then j < i and Ji C W = Jj. 

In the example in Figure [H the representative vertices are 1, 2, 3, 5, 6, 7, 10, 12, 15. The other 
vertices are not representative because the corresponding sets Jj are not maximal: 

J4 c J3, Js c J7, Jg c J5, Jii c Jio, Ji3, Ju, Jig, Jn c J12. 

The representative vertices are easily identified from the elimination tree and the monotone degrees 
of the vertices. It can be shown that a vertex j is a representative vertex if and only if 

|/j|> 141-1 VA:ech(j) (33) 

(see |PS90j ). To see this, recall from (HD that 4 C Jj and |/j| > |4| - 1 if /c e ch(j). Therefore, 
if \Ij\ = \Ik\ — 1 for some k S ch(j), then Jj = Ik C Jk- Therefore j is not a representative 
vertex because the complete subgraph defined by Jj is not maximal. Conversely, suppose j is not 
representative, i.e., Jj C J/ for some / < j. In particular, Lji / and therefore, I is a descendant of 
j in the elimination tree. Prom Theorem [2] (the set of inequalities (jlOp ). this implies that Jj C Jfc 
for all k in the path from I to j. In particular, 

Jj C JkQ Jj U {k} 
for the child k of j on this path. Therefore, Jj = Ik and \Ij\ = \Ik\ — 1. 

6.2 Supernode partitions 

By comparing the monotone degrees of the vertices in the elimination tree and their parents we 
can partition the vertices {1, 2, . . . , n} in sets Nj = {ii,i2, ■ ■ ■ ,ir} where ii = j is a representative 
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Figure 8: Two supernode partitions of the elimination tree in Figured) The representative vertices 
are shown as rectangles. The vertices joined by the paths shown with heavy lines form the sets N]^^ 
where k is the representative vertex on the path. The sets N^ are enumerated in Tabled 



clique vertex. The vertices ii, ^2, 
tree, and 
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, ir form a path from j to an ancestor ir of j in the elimination 

l/ij + l = |/i3|+2 = ••• = |4.|+r-l, 



or, equivalently, 



h. 



Ji2 = fe} U J, 



«3 



{Z2,i3,--- ,V-l} U Jj, 



(34) 



The sets A'^- are supernodes (in the definition given at the beginning of this section). In general, 
several such partitions exist. Two possible partitions for the elimination tree in Figure [Dare shown 
in Figure [8] and listed in Table [2j The representative vertices are shown as rectangles, and the sets 
Nj are the vertices on the paths shown with heavy lines. We note two important properties of the 



sets Nj-. 



The set A'^- is a subset of the clique represented by vertex j: Nj 
from (j34p which implies Jj C Jj if i G Nj and i ^ j. 



C J,.. 



This can be seen 



Define Aj = Jj \ Nj. Then we have i < min^j for all i G Nj. To see this, first note that if 
k £ Jj, then Lkj 7^ and therefore, k is an ancestor of j in the elimination tree (Theorem [T]) . 
If also k < i for some i G Nj, then k is on the path from j to i in the elimination tree. 
However, by definition of N. 



J' 



this means that k £ Nj. 



This result means that Nj U Aj is an ordered partition of the clique Jj, i.e., the elements of Nj 
have a lower index than the elements of Aj . (In |LPP891 IPS90J , the sets Nj and Aj are referred to 
as the new set new{K) and the ancestor set anc{K), respectively, where K is the clique K = Jj.) 
If Aj is nonempty, we refer to the vertex k = min Aj as the first ancestor of the clique Jj . The 
first ancestor can be identified from the elimination tree as the parent of the vertex i = ui&xNj. 
This follows from ()34p with i = ir and the fact that the parent of vertex i is the first element in 
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k 


Nk 


Ak 


1 


{1} 


{3} 


2 


{2} 


{3,4} 


3 


{3,4} 


{5,15} 


5 


{5,9} 


{15,16} 


6 


{6} 


{9,16} 


7 


{7,8} 


{9,15} 


10 


{10,11} 


{13,14,17} 


12 


{12,13,14} 


{16,17} 


15 


{15,16,17} 


{} 



k 




Nk 




Ak 


1 


{1} 






{3} 


2 


{2} 






{3,4} 


3 


{3,4} 






{5,15} 


5 


{5,9} 






{15,16} 


6 


{6} 






{9,16} 


7 


{7,8} 






{9,15} 


10 


{10,11} 






{13,14,17} 


12 


{12,13, 


14,15, 


16, 17} 


{} 


15 


{15} 






{16,17} 



Table 2: The two supernode partitions defined in Figure El The first columns of the tables are the 
representative vertices. Each clique Jk is partitioned in two sets as Jk = NkU Ak- The sets A^'^ for 
a partition of the vertices {1, 2, . . . , n} 

Jj greater than i. Note that while the first ancestor can be determined from the elimination tree 
and the sets A'^-, the rest of the sets Aj cannot be derived from the elimination tree but require 
knowledge of the clique Jj . 

The sets Nk and Ak for the two partitions in the example are listed in Table [2j 



6.3 Clique trees 

We can associate with the vertex partitioning in sets Nj a tree with the cliques Jj as its nodes. The 
root of the clique tree is the clique represented by the vertex j for which n £ Nj. The parent of the 
clique Jj is the clique which has as its representative the vertex k for which rain A j £ Nk- We will 
use the notation k = par(j) to denote that Jk is the parent of Jj in the clique tree. Figure [9] shows 
the clique trees defined by the partitions Nj in Figure [3 The clique tree satisfies the following key 
properties [PS90l p. 186] ;LPP89j : 



Aj C ^par(jr)- 



Indeed, let i 



min Ai 



ij be the first ancestor of clique Jj. By definition of the clique tree 
I € A'par(j). From the definition in (p4l) this implies that Jj C ^par(j) 
a complete subgraph of vertices with indices greater than or equal to i 



Since Aj defines 



we have Aj C Jj 



Therefore Aj C JparQ)- 



An element of Nj is in the clique Jk only if Jk is a descendant of Jj in the clique tree. 

We can show this by contradiction. Suppose i S Nj belongs to Jk and Jk is not a descendant 
of Jj in the clique tree. The sets A'^; form a partition of {1, 2, . . . ,n}, so if i G Nj and 
i G Jk for k 7^ j, then i £ Ak- By the previous property, this implies i G Jpar(fc)- We have 
par (A;) ^ j because Jk is not a descendant of Jj. Therefore i £ ^par(fc) ^-iid, again from the 
previous property, i £ Jpar(par(A:)) ^^^ i ^ ^par(par(A:))- Continuing this process recursively, we 
eventually arrive at the conclusion that i belongs to A^ where Jr is the root of the clique tree. 
However, this is impossible because Jr = Nr and A^ = 0. 

If an element of Nj is in Ak, then it belongs to all the cliques on the path between Jj and Jk 
in the clique tree. 
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Figure 9: The two clique trees corresponding to the partitions in Figure [8l The first index in the 
bottom row at each node is the representative vertex j of the clique Jj . Each clique Jj is partitioned 



in two sets J 



J 



Nj U Aj. The indices of the bottom row at each node form the sets A'^-; the indices 



in the top row form the set Aj. The parent of clique Jj is the clique Jk that includes the first 
ancestor of clique Jj in its A^^^ set. 

This follows by combining the first two properties. From the second property, if i E Nj and 
i G ^fc, then J^ is a descendant of Jj. Assume there are cliques Js, Jpav{s) on the path between 
Jk and Jj with the property that i £ Jg and i Jpar(s)- From the first property, this implies 
i £ Ng. But this contradicts i £ Nj, unless j = s, because Nj U A^s = if j 7^ s. 

Taken together, these three properties state that the cliques that contain a vertex i form a subtree 
in the clique tree. The root of the subtree is the unique clique Jk for which i G Nk- This is known 
as the induced subtree property of clique trees [BP93J . 

6.4 Clique tree algorithm 

To summarize the results of this section, we state a simple algorithm that identifies the representa- 
tive vertices of the cliques, generates a partition into sets Nk, identifies the first ancestors min^fc of 
the cliques, and determines the parent structure of the clique tree. The algorithm is due to Pothen 
and Sun [PS90l p.l85]. 

Algorithm 6.1. Clique tree algorithm. 

Input. An elimination tree and the monotone degree \Ik\, k = 1, . . . ,n. 

Output. The representative vertices, the partition in supernodes A'^-, the first ancestor 
of each clique, and the parent structure of a clique tree. 
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Algorithm. For i = 1, . . . , n: 

1. If \Ii\ > \Ij\ — 1 for all j G ch(i), then i is a representative vertex. Set Ni = {i} 
and k = i. Otherwise, choose a vertex j £ ch(i) with |Jj| = \Ij\ — 1, determine 
the representative vertex k for which j £ N^, and add i to Nk- 

2. For each j G ch(z), if j G A^; and I / /c, set par(/) := k and min A; := i. 

Note that in step 1, there may be several choices for the child vertex j, and these choices lead 
to different vertex partitions and different clique trees. The vertex i = 16 in Figm'eEl for example, 
has two children j that both satisfy |/j| = \Ij\ — 1. These two choices lead to the different vertex 
partitions in Figure [8] and the two clique trees in Figure M 

7 Supernodal multifrontal algorithms 

We assume there are I cliques, with representative nodes ii, ..., ii and that the sets Ni are 
contiguous. The supernodal algorithms are block versions of the multifrontal algorithms in which 
the scalar diagonal elements Xjj are replaced with dense principal blocks X^v at and the subcolumns 
Xj.j with dense submatrices X^-isf.. 

For a clique J^ we denote by ch( J^) the set of child cliques of J^ in the clique tree. This is not 
to be confused with ch(A:) (with a vertex k as argument), which refers to the children of the vertex 
k in the elimination tree. 

In this section we start with a supernodal version of the Cholesky factorization algorithm. We 
then give similar extensions of the primal and dual gradient evaluation algorithms. For the sake of 
brevity, we will omit the extensions of the other algorithms in Sections [SHSl which follow the same 
pattern. 

7.1 Cholesky factorization 

In the supernodal Cholesky factorization we factor X as X = LDL with D block-diagonal and 
L unit lower triangular. The matrix D has / dense diagonal blocks Djsi^Ni for i G {ii, . . . ,ii}. 
Corresponding with each clique, L has a diagonal block Ljy-N- = I and a dense submatrix LAiNi- 
The rest of the block-column indexed by Ni is zero. 

Algorithm 7.1. Cholesky factorization. 

Input. A positive definite matrix X G Sy and a clique tree for the sparsity pattern V. 

Output. The factors L, D in the Cholesky factorization X = LDL^ . 

Algorithm. Iterate over j G {ii,i2, ■ ■ ■ -.k} using a topological order of the clique tree. 
For each j, form the frontal matrix 



F3 



Fii F^^ 
F21 F22 



T 



Xn^n, X 
Xa.n^ 



Ji£ch{Jj) 



and calculate Dj^.j^., Lan, and the update matrix Uj from 

F>NjNj = Fu, LajNj = F2iDj^,^^, Uj = F22 - LajNjF)njNjLajNj- 
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As can be seen, only one frontal matrix is assembled per clique, a major advantage compared to 
Algorithm 13.11 Moreover, the main computation is the level-3 BLAS operation in the computation 
off/,. 

7.2 Gradients 

The supernodal counterpart of Algorithms 14.11 for computing the primal gradient or projected 
inverse is as follows. In this algorithm, Vj is a dense 'update matrix' defined as Vj = Sa A ■ 



Algorithm 7.2. Projected inverse. 






Input. The Cholesky factors L, D of a positive definite matrix X ^ S\ 

Output. The projected inverse S = V{X^^) = —Vf{X). 

Algorithm. Iterate over j G {ii,. . . ,i/} using a reverse topological order of the clique 
tree. For each j, calculate Sn^n, and Sa^n, from 



Sa., 



N, 



-VjLa^n,, 



Sn,n, 



Dl 



SAiNi^A.Nj 



and compute the update matrices 



(35) 



^i - Ej.a, 



SNjNj ^a,N, 



S 



Sa,n, 



V, 



EjjA,, Ji G ch(Jj). 



(36) 



The main calculation is the matrix-matrix product in (j35p which replaces the matrix-vector prod- 
uct (I23p in the multifrontal algorithm. 

The extension of Algorithm 14.21 for computing the dual gradient or the maximum determinant 
positive definite completion is as follows. 



Algorithm 7.3. Matrix completion. 

Input. A matrix S £ Sy that has a positive definite completion. 

Output. The Cholesky factors L, D of X = — V/*(S'), i.e., of the positive definite matrix 
X G S^ that satisfies V{X-^) = S. 

Algorithm. Iterate over j G {ii, . . . ,ii} using a reverse topological order of the clique 
tree. For each j, compute Dj\f.]\[ and La^n^ from 



^A,N, = -Vj ^Sa,n,^ Dn^n, = {Sn,n, + S^^^Za.n,) \ 
and compute the update matrices 



Vi = E 



JjA.i 






Ej^A,, Jj G ch(Jj). 



(37) 



(38) 
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As for the multifrontal completion algorithm with factored update matrices (Algorithm I4.3p . 
this algorithm can be improved by propagating a factorization of Vj and using (1380 to compute the 
factors of Vi from the factors of Vj. We mentioned in section [6^2] that for every clique, the vertices 
in Ni precede those in Ai. As a consequence, the matrix Ej Ai can be partitioned as 



Ej,A, 



En, 



AiCiNj 




E 





Aj,Ai\Nj 



Using this property in (p8|) we get 



Vi = E 



JjA.i 



Sn.Nj 
Sa,n, 



Vi 



E.L 



A, 



A B^ 



where A is a principal submatrix of S^N- of order \Ai n A'jl, and C consists of |^i \ Nj\ rows of 
the upper triangular factor Rj of Vj = RjRJ ■ By reducing C to square triangular form C = RQ^ 
using a series of Householder transformations, and a factorization A — B R R^^ B = RR we 
obtain the factorization V = RjRT as 



Ri 



R [R-^BY 
R 



7.3 Numerical results 

We apply the supernodal multifrontal algorithms to the test problems described in Section 14.31 
The left-hand plot in Figure [10] shows the CPU times for Algorithms 17.11 (Cholesky factorizatoin) 
and 17.21 (projected inverse or primal gradient). The right-hand plot shows shows the CPU times for 
Algorithms 17.21 and 14.11 (supernodal multifrontal and multifrontal projected inverse, respectively). 
From the first plot we see that the cost of evaluating the primal gradient is comparable to the cost of 
computing the Cholesky factorization. The second plot shows that the supernodal implementation 
of the primal gradient is substantially faster than the non-supernodal implementation, for all but 
a few of the small problems. 



8 Conclusions 

We have derived recursive algorithms for evaluating the values, gradients, and Hessians of the 
primal and dual barriers 

/(X) = -logdetX, US)= sup {-tv{SX)-f{X)), 



defined for sparse symmetric matrices AT, S" G Sy with a given sparsity pattern U, where V is 
a filled (or chordal) pattern. Our interest in these algorithms is motivated by their importance 
in interior-point methods for conic optimization with sparse matrix cone constraints [ADVIOJ . 
Similar algorithms can be formulated for closely related problems that arise in sparse semidefi- 
nite programming, for example, the matrix completion techniques used in primal-dual methods 
|FKMN00irNFF+03| . 
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Figure 10: Scatter plots of CPU times for supernodal algorithms applied to 128 test problems. 
The plot on the left shows that the cost of computing the Cholesky factorization and the projected 
inverse is approximately the same. The plot on the right shows that the supernodal multifrontal 
implementation of the projected inverse algorithm is faster than the multifrontal implementation 
for all but some small problems. 

Our goal was to formulate efficient barrier algorithms based on Cholesky factorization techniques 
for large sparse matrices and, specifically, the multifrontal algorithm that has been extensively stud- 
ied in the sparse matrix literature since the 1980s. The algorithms inherit many of the properties 
of the multifrontal method. This means that a wide range of known techniques from the sparse 
matrix literature can be used to further improve the algorithms. For example, tree parallelism 
and node parallelism are readily exploited in a multifrontal method [ADLOO.J . Relaxed super- 
nodes and supernode amalgamation techniques JDR831 IAG89) have also been shown to improve 
the performance. Other improvements include tree modifications |Liu88| and memory optimization 
techniques |Liu861 [GL06| . 

The starting point in this paper was the multifrontal Cholesky factorization algorithm. Similar 
algorithms can be derived from the other popular types of sparse factorization algorithms, such as 
the up-looking Cholesky factorization (used in CHOLMOD [CDHR08] ) or the left-looking Cholesky 
factorization (a blocked version of which is used in CHOLMOD's supernodal solver). It would be of 
interest to compare the performance of these algorithms with the multifrontal algorithms formulated 
in this paper. 
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